Automatic 3-Language Cross-Language Information Retrieval with Latent Semantic Indexing
Abstract
This paper describes cross-language information retrieval experiments carried out for TREC-6. Our retrieval method, cross-language latent semantic indexing (CL-LSI), is completely automatic and we were able to use it to create a 3-way English-French-German IR system. This study extends our previous work in terms of the large size of training and testing corpora, the use of low-quality training data, the evaluation using relevance judgments, and the number of languages analyzed.
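The abstract does not spell out how CL-LSI works. The following is a minimal sketch of the general idea, not the authors' TREC-6 system. It assumes the commonly described setup in which each training document is the concatenation of its parallel versions, the multilingual space comes from a truncated SVD of the resulting term-document matrix, and monolingual documents and queries are folded into that space without translation. The toy corpus, the function names (`train_cl_lsi`, `fold_in`, `cosine`), and the choice `k=2` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def to_vector(text, vocab):
    """Raw term-count vector; words outside the training vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def train_cl_lsi(parallel_docs, k):
    """Build a k-dimensional multilingual space (assumed training setup)."""
    # Each training "document" is the concatenation of its language versions.
    combined = [" ".join(versions) for versions in parallel_docs]
    words = sorted({w for t in combined for w in t.split()})
    vocab = {w: i for i, w in enumerate(words)}
    A = np.column_stack([to_vector(t, vocab) for t in combined])
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return vocab, U[:, :k], s[:k]

def fold_in(text, vocab, U_k, s_k):
    """Project a monolingual document or query into the shared space."""
    return (U_k.T @ to_vector(text, vocab)) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy training set: (English, French, German) versions per document.
parallel = [
    ("dog cat", "chien chat", "hund katze"),
    ("dog pet", "chien animal", "hund tier"),
    ("money bank", "argent banque", "geld konto"),
    ("money loan", "argent pret", "geld kredit"),
]
vocab, U_k, s_k = train_cl_lsi(parallel, k=2)

# Monolingual test collection in French and German (hypothetical).
collection = {
    "fr_pets": "chien chat",
    "fr_finance": "argent banque",
    "de_pets": "hund katze",
    "de_finance": "geld kredit",
}

# An English query is matched against French and German documents directly.
query = fold_in("dog", vocab, U_k, s_k)
scores = {name: cosine(query, fold_in(text, vocab, U_k, s_k))
          for name, text in collection.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

In this toy setup the English query "dog" scores close to 1 against the French and German pet documents and close to 0 against the finance documents, even though it shares no words with any of them. A real system would apply term weighting and use a far larger training corpus and many more dimensions.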
Similar resources
Automatic Cross-Language Retrieval Using Latent Semantic Indexing
We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multilingual semantic space using Latent Semantic Indexing (LSI). Strong test results for the cross-language LSI...
Automatic Cross-Language Information Retrieval using Latent Semantic Indexing
We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multi-lingual semantic space using Latent Semantic Indexing (LSI). We present strong preliminary test results fo...
Indexing Audio Documents by using Latent Semantic Analysis and SOM
This paper describes an important application for state-of-the-art automatic speech recognition, natural language processing and information retrieval systems. Methods for enhancing the indexing of spoken documents by using latent semantic analysis and self-organizing maps are presented, motivated and tested. The idea is to extract extra information from the structure of the document collection an...
Cross-lingual Information Retrieval Model based on Bilingual Topic Correlation
How to construct relationships between bilingual texts is important for effectively processing multilingual text data and crossing language barriers. Corpus-based cross-lingual latent semantic indexing (CL-LSI) does not fully take into account bilingual semantic relationships. The paper proposes a new model that builds semantic relationships between bilingual parallel documents via partial least squares (PLS)....
Explicit Versus Latent Concept Models for Cross-Language Information Retrieval
The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-word based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such ...